1.
arxiv; 2023.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2309.06503v1

ABSTRACT

The COVID-19 pandemic has presented significant challenges to the healthcare industry and society as a whole. With the rapid development of COVID-19 vaccines, social media platforms have become a popular medium for discussions on vaccine-related topics. Identifying vaccine-related tweets and analyzing them can provide valuable insights for public health researchers and policymakers. However, manual annotation of a large number of tweets is time-consuming and expensive. In this study, we evaluate the use of Large Language Models, in this case GPT-4 (March 23 version), and weak supervision to identify COVID-19 vaccine-related tweets, with the purpose of comparing performance against human annotators. We leveraged a manually curated gold-standard dataset and used GPT-4 to provide labels without any additional fine-tuning or instructing, in a single-shot mode (no additional prompting).


Subject(s)
COVID-19
2.
arxiv; 2021.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2107.12565v1

ABSTRACT

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present (Long-COVID). However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.


Subject(s)
COVID-19
3.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.07.13.21260449

ABSTRACT

As the SARS-CoV-2 virus (COVID-19) continues to affect people across the globe, there is limited understanding of the long term implications for infected patients. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually designed by clinicians, and not granular enough to understand the natural history or patient experiences of "long COVID". In order to get a complete picture, there is a need to use patient generated data to track the long-term impact of COVID-19 on recovered patients in real time. There is a growing need to meticulously characterize these patients' experiences, from infection to months post-infection, and with highly granular patient generated data rather than clinician narratives. In this work, we present a longitudinal characterization of post-COVID-19 symptoms using social media data from Twitter. Using a combination of machine learning, natural language processing techniques, and clinician reviews, we mined 296,154 tweets to characterize the post-acute infection course of the disease, creating detailed timelines of symptoms and conditions, and analyzing their symptomatology during a period of over 150 days.


Subject(s)
COVID-19 , Infections
4.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.03.02.21252763

ABSTRACT

Background: Low testing rates, compounded by reporting delays, hinder the estimation of the mortality burden associated with the COVID-19 pandemic based on surveillance data alone. A more reliable picture of the effect of the COVID-19 pandemic on mortality can be derived by estimating excess deaths above an expected level of death. In this study we aim to estimate the absolute and relative mortality impact of the COVID-19 pandemic in Mexico in 2020 by gender and two geographic regions: Mexico City and the rest of the country. Methods: We obtained mortality time series due to all causes for Mexico, and by gender and geographic region, using epidemiological weeks from January to December 2020 and for the preceding 5 years. We also compiled data on COVID-19 related morbidity and mortality to assess the timing and intensity of the pandemic in Mexico. We assembled weekly series of the number of tweets about death from Mexico to assess the correlation between people's media interaction about death and the rise in pandemic deaths. We estimated all-cause excess mortality rates and mortality rate ratio increase over baseline by fitting Serfling regression models. Results: The COVID-19 pandemic excess mortality rate per 10,000 population in Mexico between March 1, 2020 and January 2, 2021 was estimated at 26.10. The observed total number of deaths due to COVID-19 was 128,886, which is 38.64% of the total estimated excess deaths. Males had about 2-fold higher excess mortality rate (33.99) compared to females (18.53). The excess mortality rate for Mexico City (63.54) was about 2.7-fold higher than the rest of the country (23.25). Similarly, the mortality rate ratio relative to baseline was highest for Mexico City (RR: 2.09). There was no significant correlation between weekly number of tweets on death and the weekly all-cause excess mortality rates (ρ = 0.309; 95% CI: 0.010, 0.558; p-value = 0.043).
Conclusion: The excess mortality rate of 26.10 per 10,000 population corresponds to a total of 333,538 excess deaths in Mexico between March 1, 2020 and January 2, 2021. COVID-19 accounted for only 38.21% of the total excess deaths, which reflects either the effect of low testing rates in Mexico, or the surge in number of deaths due to other causes.
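
The ratios quoted in this abstract can be rechecked from its own figures. The short Python sketch below is only an illustration of that arithmetic, not the authors' analysis code: the function and variable names are hypothetical, and the inputs are copied from the abstract. It reproduces the 38.64% share given in the Results (the Conclusion quotes 38.21%, which this calculation does not reproduce) and shows that "about 2-fold" and "about 2.7-fold" correspond to ratios of roughly 1.83 and 2.73.

```python
# Figures copied from the abstract (Mexico, March 1, 2020 to January 2, 2021).
OBSERVED_COVID_DEATHS = 128_886
ESTIMATED_EXCESS_DEATHS = 333_538

# Excess mortality rates per 10,000 population, as quoted in the abstract.
EXCESS_RATE_PER_10K = {
    "male": 33.99,
    "female": 18.53,
    "mexico_city": 63.54,
    "rest_of_country": 23.25,
}


def percent_share(part, whole):
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def fold_difference(rate_a, rate_b):
    """Return how many times larger `rate_a` is than `rate_b` (two decimals)."""
    return round(rate_a / rate_b, 2)


# Share of estimated excess deaths that were recorded as COVID-19 deaths.
covid_share = percent_share(OBSERVED_COVID_DEATHS, ESTIMATED_EXCESS_DEATHS)

# Ratios behind the "about 2-fold" and "about 2.7-fold" statements.
male_vs_female = fold_difference(
    EXCESS_RATE_PER_10K["male"], EXCESS_RATE_PER_10K["female"]
)
city_vs_rest = fold_difference(
    EXCESS_RATE_PER_10K["mexico_city"], EXCESS_RATE_PER_10K["rest_of_country"]
)

print(covid_share, male_vs_female, city_vs_rest)
```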


Subject(s)
COVID-19
5.
researchsquare; 2021.
Preprint in English | PREPRINT-RESEARCHSQUARE | ID: ppzbmed-10.21203.rs.3.rs-279400.v1

ABSTRACT

Background: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Methods: We conducted a descriptive cohort study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub [4]. Findings: We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts, and are available in an interactive website: https://data.ohdsi.org/Covid19CharacterizationCharybdis/. Interpretation: CHARYBDIS findings provide benchmarks that contribute to our understanding of COVID-19 progression, management and evolution over time. This can enable timely assessment of real-world outcomes of preventative and therapeutic options as they are introduced in clinical practice.


Subject(s)
COVID-19 , Coronavirus Infections , Leishmaniasis, Cutaneous
6.
arxiv; 2021.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2102.06836v2

ABSTRACT

The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data. We present an unsupervised, iterative approach to mine clinically relevant information from social media data, which begins by heuristically filtering for HCP-authored texts and incorporates topic modeling and concept extraction with MetaMap. This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets from January to mid-June 2020. We also show that because the technique does not require manual labeling, it can be used to identify emerging topics on a week-to-week basis. Our method can aid in future public-health emergencies by facilitating knowledge transfer among healthcare workers in a rapidly-changing information environment, and by providing an efficient and unsupervised way of highlighting potential areas for clinical research.


Subject(s)
COVID-19
7.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.01.11.21249561

ABSTRACT

Mexico has experienced one of the highest COVID-19 death rates in the world. A delayed response towards implementation of social distancing interventions until late March 2020 and a phased reopening of the country in June 2020 have facilitated sustained disease transmission in the region. Here, we systematically generate and compare 30-day ahead forecasts using previously validated growth models based on mortality trends from the Institute for Health Metrics and Evaluation for Mexico and Mexico City in near real-time. Moreover, we estimate reproduction numbers for SARS-CoV-2 based on methods that rely on genomic data as well as case incidence data. Subsequently, functional data analysis techniques are utilized to analyze the shapes of COVID-19 growth rate curves at the state level to characterize the spatial-temporal transmission patterns. The early estimates of reproduction number for Mexico were estimated between R~1.1-from genomic and case incidence data. Moreover, the mean estimate of R has fluctuated around 1.0 from late July till the end of September 2020. The spatial analysis characterizes the state-level dynamics of COVID-19 into four groups with distinct epidemic trajectories. We found that the sequential mortality forecasts from the GLM and Richards model predict downward trends in the number of deaths for all thirteen forecast periods for Mexico and Mexico City. The sub-epidemic and IHME models predict a more realistic, stable trajectory of COVID-19 mortality trends for the last three forecast periods (09/21-10/21 - 09/28-10/27) for Mexico and Mexico City. Our findings support the view that phenomenological models are useful tools for short-term epidemic forecasting, albeit forecasts need to be interpreted with caution given the dynamic implementation and lifting of social distancing measures.


Subject(s)
COVID-19
8.
medrxiv; 2020.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.07.29.20164418

ABSTRACT

As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long term implications for recovered patients. There have been reports of persistent symptoms in patients after confirmed infections, even three months after initial recovery. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually not publicly available or standardized to perform longitudinal analyses on them. Therefore, there is a need to use additional data sources for continued follow-up and identification of latent symptoms that might be underreported in other places. In this work we present a preliminary characterization of post-COVID-19 symptoms using social media data from Twitter. We use a combination of natural language processing and clinician reviews to identify long term self-reported symptoms on a set of Twitter users.


Subject(s)
COVID-19 , Hallucinations
9.
arxiv; 2020.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2007.10276v2

ABSTRACT

Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter to identify discourse around drug mentions. While this is seemingly a straightforward task, the informal nature of language use on Twitter means that machine learning is needed alongside traditional automated methods. By applying these complementary methods, we are able to recover almost 15% additional data, making misspelling handling a needed pre-processing step when dealing with social media data.
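
To make the misspelling problem concrete, the sketch below shows one simple way a misspelled drug name can be mapped back to a lexicon entry. It is not the method used in the preprint, which the abstract describes only as machine learning alongside traditional automated methods; here plain string similarity from Python's standard `difflib` module stands in for it. The lexicon, the function name `find_drug_mentions`, the example tweet, and the 0.85 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical mini-lexicon; a real study would use a full drug dictionary.
DRUG_LEXICON = ["hydroxychloroquine", "remdesivir", "azithromycin"]


def find_drug_mentions(text, lexicon=DRUG_LEXICON, cutoff=0.85):
    """Return (token, lexicon_entry) pairs for tokens that closely match a drug name.

    Matching uses difflib's similarity ratio, so exact spellings and near
    misspellings above `cutoff` are both captured.
    """
    mentions = []
    for token in text.lower().split():
        # Strip common punctuation and social media markers around the token.
        word = token.strip(".,!?;:()\"'#@")
        if not word:
            continue
        match = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        if match:
            mentions.append((word, match[0]))
    return mentions


# Made-up example text containing one misspelled and one correct drug name.
example = "Is hydroxycloroquine safe? Taking remdesivir now."
print(find_drug_mentions(example))
```

An exact-match lookup would miss the first mention in this example, which is the kind of loss the abstract attributes to informal spelling. The cutoff trades recall against false matches and would need tuning on annotated data.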


Subject(s)
COVID-19
10.
medrxiv; 2020.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.04.22.20074336

ABSTRACT

Background: In this study we phenotyped individuals hospitalised with coronavirus disease 2019 (COVID-19) in depth, summarising entire medical histories, including medications, as captured in routinely collected data drawn from databases across three continents. We then compared individuals hospitalised with COVID-19 to those previously hospitalised with influenza. Methods: We report demographics, previously recorded conditions and medication use of patients hospitalised with COVID-19 in the US (Columbia University Irving Medical Center [CUIMC], Premier Healthcare Database [PHD], UCHealth System Health Data Compass Database [UC HDC], and the Department of Veterans Affairs [VA OMOP]), in South Korea (Health Insurance Review & Assessment [HIRA]), and Spain (The Information System for Research in Primary Care [SIDIAP] and HM Hospitales [HM]). These patients were then compared with patients hospitalised with influenza in 2014-19. Results: 34,128 (US: 8,362, South Korea: 7,341, Spain: 18,425) individuals hospitalised with COVID-19 were included. Between 4,811 (HM) and 11,643 (CUIMC) unique aggregate characteristics were extracted per patient, with all summarised in an accompanying interactive website (http://evidence.ohdsi.org/Covid19CharacterizationHospitalization/). Patients were majority male in the US (CUIMC: 52%, PHD: 52%, UC HDC: 54%, VA OMOP: 94%) and Spain (SIDIAP: 54%, HM: 60%), but were predominantly female in South Korea (HIRA: 60%). Age profiles varied across data sources. Prevalence of asthma ranged from 4% to 15%, diabetes from 13% to 43%, and hypertensive disorder from 24% to 70% across data sources. Between 14% and 33% were taking drugs acting on the renin-angiotensin system in the 30 days prior to hospitalisation. Compared to 81,596 individuals hospitalised with influenza in 2014-19, patients admitted with COVID-19 were more typically male, younger, and healthier, with fewer comorbidities and lower medication use.
Conclusions: We provide a detailed characterisation of patients hospitalised with COVID-19. Protecting groups known to be vulnerable to influenza is a useful starting point to minimize the number of hospital admissions needed for COVID-19. However, such strategies will also likely need to be broadened so as to reflect the particular characteristics of individuals hospitalised with COVID-19.


Subject(s)
Diabetes Mellitus , Hypertension , COVID-19
11.
arxiv; 2020.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2004.03688v2

ABSTRACT

As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated in the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique world-wide event into biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 152 million tweets, growing daily, related to COVID-19 chatter generated from January 1st to April 4th at the time of writing. This open dataset will allow researchers to conduct a number of research projects relating to the emotional and mental responses to social distancing measures, the identification of sources of misinformation, and the stratified measurement of sentiment towards the pandemic in near real time.


Subject(s)
COVID-19